Answers generally go above and beyond what I expect from you. They are meant to show you the correct answer, explain why it is correct, and potentially show several methods by which you can reach the answer.

The Popularity of Baby Names

Install and load the package babynames. Get help for ?babynames to see what the data includes.

# install.packages("babynames")

# Note I've “commented” out some of these commands  (with a #) so they do not run when I knit this document
# You should **never** install a package inside a .Rmd document, just do that in R Studio itself
# Of course, you do need to load everything with library() in a .Rmd document!

library(babynames)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# ?babynames()

Question 1

Part A

What are the top 5 boys names for 2017, and what percent of overall names is each?

# save as a new tibble
top_5_boys_2017 <- babynames %>% # take data
  filter(sex=="M", # filter by males
         year==2017) %>% # and for 2007
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(., n=5) also works
  mutate(percent = round(prop*100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_boys_2017

The top 5 names are

top_5_boys_2017 %>%
  select(name,percent) %>%
  knitr::kable()
name percent
Liam 0.95
Noah 0.93
William 0.76
James 0.72
Logan 0.71

Alternatively, you could just write what you found manually into an object like:

top_5_boys_2017_alt <- c("Liam", "Noah", "William", "James", "Logan")

top_5_boys_2017_alt
## [1] "Liam"    "Noah"    "William" "James"   "Logan"
# you could alternatively add a command, 
# %>% pull(name) to the first chunk of code, 
# and it would do the same thing, but we'd want to save it, 
# for example:

top_5_boys_2017_alt <- babynames %>%
  filter(sex=="M",
         year==2017) %>%
  arrange(desc(n)) %>% 
  slice(1:5) %>%
  mutate(percent = round(prop*100, 2)) %>%
  pull(name)
  
top_5_boys_2017_alt
## [1] "Liam"    "Noah"    "William" "James"   "Logan"

Part B

What are the top 5 girls names for 2017, and what percent of overall names is each?

# save as a new tibble
top_5_girls_2017 <- babynames %>% # take data
  filter(sex=="F", # filter by females
         year==2017) %>% # and for 2007
  arrange(desc(n)) %>% # arrange in largest-to-smallest order of n (number)
  slice(1:5) %>% # optional, look only at first 5 rows; head(., n=5) also works
  mutate(percent = round(prop*100, 2)) # also optional, make a percent variable rounded to 2 decimals

# look at our new tibble
top_5_girls_2017

The top 5 names are

top_5_girls_2017 %>%
  select(name,percent) %>%
  knitr::kable()
name percent
Emma 1.05
Olivia 0.99
Ava 0.85
Isabella 0.81
Sophia 0.79

Alternatively, you could just write what you found manually into an object like:

top_5_girls_2017_alt <- c("Emma", "Olivia", "Ava", "Isabella", "Sophia")

Question 2

Make two barplots of these top 5 names, one for each sex. Map aesthetics x to name and y to prop [or percent, if you made that variable, as I did.] and use geom_col (since you are declaring a specific y, otherwise you could just use geom_bar() and just an x.)

ggplot(data = top_5_boys_2017)+
  aes(x = reorder(name, n), #note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1,
                                  by = 0.25),
                     limits = c(0,1), # limit axis to between 0 and 1
                     expand = c(0,0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Babies With Name",
       title = "Most Popular Boys Names Since 1880",
       fill = "Boy's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "default")+ # use viridis discrete color palette
  theme(legend.position = "") # hide legend
## Warning in viridisLite::viridis(n, alpha, begin, end, direction, option): Option
## 'default' does not exist. Defaulting to 'viridis'.

ggplot(data = top_5_girls_2017)+
  aes(x = reorder(name, n), #note this reorders the x variable from small to large n
      y = percent, # you can use prop if you didn't make a percent variable
      fill = name)+ # optional color!
  geom_col()+
  # all of the above is sufficient, now I'm just making it pretty
  scale_y_continuous(labels = function(x){paste0(x, "%")}, # add percent signs
                     breaks = seq(from = 0, # make line breaks every 0.25%
                                  to = 1.25,
                                  by = 0.25),
                     limits = c(0,1.3), # limit axis to between 0 and 1.2
                     expand = c(0,0))+ # don't let it go beyond this
  labs(x = "Name",
       y = "Percent of All Girls With Name",
       title = "Most Popular Girls Names Since 1880",
       fill = "Girl's Name",
       caption = "Source: SSA")+
  ggthemes::theme_pander(base_family = "Fira Sans Condensed", base_size=16)+
  coord_flip()+ # flip axes to make horizontal!
  scale_fill_viridis_d(option = "default")+ # use viridis discrete color palette
  theme(legend.position = "") # hide legend
## Warning in viridisLite::viridis(n, alpha, begin, end, direction, option): Option
## 'default' does not exist. Defaulting to 'viridis'.

If you had gone the alternate route by saving an object of names (like I did above with top_5_boys_2017_alt and top_5_girls_2017_alt), you could filter the data using the %in% operator to use for your data layer of each plot, like so:

boys_data <- babynames %>%
  filter(name %in% top_5_boys_2017_alt) # this will only use data for the 5 names

ggplot(data = boys_data) #+... the rest of the plot code above

Question 3

Find your name. [If your name isn’t in there :(, pick a random name.] count by sex how many babies since 1880 were named your name. [Hint: if you do this, you’ll get the number of rows (years) there are in the data. You want to add the number of babies in each row (n), so inside count, add wt = n to weight the count by n.] Also add a variable for the percent of each sex.

babynames %>%
  filter(name == "Ryan") %>%
  count(sex, wt=n) %>%
  mutate(percent = round((n/sum(n)*100),2))

Question 4

Make a line graph of the number of babies with your name over time, colored by sex.

# note here I'm going to wrangle the data and then pipe it directly into ggplot
# you can wrangle the data and save it as a different tibble, then use THAT tibble
# for your (data = ...) command in ggplot

# first wrangle data
babynames %>%
  filter(name == "Ryan") %>%

  # now we pipe into ggplot
  ggplot(data = .)+ # the "." is a placeholder for the stuff above!
  aes(x = year,
      y = n,
      color = sex)+
  geom_line(size = 1)+
  scale_color_manual(values = c("F" = "#e64173", # make my own colors
                                "M" = "#0047AB"))+
  labs(x = "Year",
       y = "Number of Babies",
       title = "Popularity of Babies Named 'Ryan'",
       color = "Sex",
       caption = "Source: SSA")+
    theme_classic(base_family = "Fira Sans Condensed", base_size=16)

Question 5

Part A

Find the most common name for boys by year between 1980-2017. [Hint: you’ll want to first group_by(year). Once you’ve got all the right conditions, you’ll get a table with a lot of data. You only want to slice(1) to keep just the 1st row of each year’s data.]

babynames %>%
  group_by(year) %>% # we want one observation per year
  filter(sex == "M",
         year>1979) %>% # or >==1980
  arrange(desc(n))%>% # start with largest n first
  slice(1) # take first row only

Part B

Now do the same for girls.

babynames %>%
  group_by(year) %>% # we want one observation per year
  filter(sex == "F",
         year>1979) %>% # or >==1980
  arrange(desc(n))%>% # start with largest n first
  slice(1) # take first row only

Question 6

Now let’s graph the evolution of the most common names since 1880.

Part A

First, find out what are the top 5 overall most popular names for boys and for girls in the data. [Hint: first group_by(name).] You may want to create two objects, each with these top 5 names as character elements.

babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex=="M") %>%
  summarize(total=sum(n)) %>% # add upp all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5) 
# make a vector of the names (we'll need this for our graph below)
top_boys_names<-c("James", "John", "Robert", "Michael", "William")

# you could alternatively add a command, 
# %>% pull(name) to the first chunk of code, 
# and it would do the same thing, but we'd want to save it, 
# for example:

babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex=="M") %>%
  summarize(total=sum(n)) %>% # add upp all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5) %>%
  pull(name)
## [1] "James"   "John"    "Robert"  "Michael" "William"
babynames %>%
  group_by(name) %>% # we want one row per name
  filter(sex=="F") %>%
  summarize(total=sum(n)) %>% # add upp all of the n's for all years for each name
  arrange(desc(total)) %>% # list largest total first
  slice(1:5)
# make a vector of the names (we'll need this for our graph below)
top_girls_names<-c("Mary", "Elizabeth", "Patricia", "Jennifer", "Linda")

Part B

Now make two linegraphs of these 5 names over time, one for boys, and one for girls. [Hint: you’ll first want to subset the data to use for your data layer in the plot. First group_by(year) and also make sure you only use the names you found in Part A. Try using the %in% command to do this.]

babynames %>%
  group_by(year) %>%
  filter(sex == "M",
         name %in% top_boys_names) %>%
  ggplot(data = .,
         aes(x = year,
             y = prop,
             color = name))+
  geom_line(size = 1)+
      labs(x = "Year",
         y = "Proportion of Babies with Name",
         title = "Most Popular Boys Names Since 1880",
         color = "Boy's Name",
         caption = "Source: SSA")+
    theme_classic(base_family = "Fira Sans Condensed", base_size=16)

babynames %>%
  group_by(year) %>%
  filter(sex == "F",
         name %in% top_girls_names) %>%
  ggplot(data = .,
         aes(x = year,
             y = prop,
             color = name))+
  geom_line(size = 1)+
    labs(x = "Year",
         y = "Proportion of Babies with Name",
         title = "Most Popular Girls Names Since 1880",
         color = "Girl's Name",
         caption = "Source: SSA")+
    theme_classic(base_family = "Fira Sans Condensed", base_size=16)

Question 7

Bonus (hard!): What are the 10 most common “gender-neutral” names? [This is hard to define. For our purposes, let’s define this as names where between 48% and 52% of the babies with the name are Male.]

There’s a lot to this, so I’ll break this up step by step and show you what happens at each major step.

We want to find the names where 48% to 52% of the babies with the name are male, as I defined in the footnote. First let’s mutate a variable to figure out how many babies with a particular name are male.

To do this, we’ll need to make a two variables to count the number of males and females of each name each year. We’ll use the ifelse() function for each:

  1. Make a male variable where, for each name in each year, if sex=="M", then count the number of males (n) that year, otherwise set it equal to 0.
  2. Make a female variable where, for each name in each year, if sex=="F", then count the number of females (n) that year, otherwise set it equal to 0.
babynames %>%
  mutate(male = ifelse(sex == "M", n, 0),
         female = ifelse(sex == "F", n, 0))

Now with this variable, we want to count the total number of males and females with each name over the entire dataset. Let’s first group_by(name) so we’ll get one row for every name. We will summarize() and take the sum of our male and of our female variables.

babynames %>%
  mutate(male = ifelse(sex == "M", n, 0),
         female = ifelse(sex == "F", n, 0)) %>%
  group_by(name) %>%
    summarize(Male = sum(male),
              Female = sum(female))

Now, we want to figure out what fraction of each name is Male or Female. It doesn’t matter which we do here, I’ll do Male. mutate() a new variable I’ll call perc_male for the percent of the name being for Male babies. It takes the summed variables we made before, and takes the fraction that are Male, multiplying by 100 to get percents (which isn’t necessary, but is easy to read).

babynames %>%
  mutate(male = ifelse(sex == "M", n, 0),
         female = ifelse(sex == "F", n, 0)) %>%
  group_by(name) %>%
    summarize(Male = sum(male),
              Female = sum(female))%>%
  mutate(perc_male = (Male/(Male+Female)*100))

Right now, it’s still in alphabetical order. We want to arrange it by perc_male, and more importantly, we want perc_male to be between 48 and 52, so let’s filter accordingly:

babynames %>%
  mutate(male = ifelse(sex == "M", n, 0),
         female = ifelse(sex == "F", n, 0)) %>%
  group_by(name) %>%
    summarize(Male = sum(male),
              Female = sum(female))%>%
  mutate(perc_male = (Male/(Male+Female)*100)) %>%
  arrange(perc_male) %>%
  filter(perc_male > 48,
         perc_male < 52)

This gives us a lot of names, all falling between 48% and 52% male. But we want the most popular names that are in this range. So let’s finally mutate a new variable called total that simply adds the number of Male and Female babies with a name. Then let’s arrange our results by desc(total) to get the largest first, and then slice(1:10) to get the top 10 only.

babynames %>%
  mutate(male = ifelse(sex == "M", n, 0),
         female = ifelse(sex == "F", n, 0)) %>%
  group_by(name) %>%
    summarize(Male = sum(male),
              Female = sum(female))%>%
  mutate(perc_male = (Male/(Male+Female)*100)) %>%
  arrange(perc_male) %>%
  filter(perc_male > 48,
         perc_male < 52) %>%
  mutate(total = Male+Female) %>%
  arrange(desc(total)) %>%
  slice(1:10)

Political and Economic Freedom Around the World

For the remaining questions, we’ll look at the relationship between Economic Freedom and Political Freedom in countries around the world today. Our data for economic freedom comes from the Fraser Institute, and our data for political freedom comes from Freedom House.

Question 8

Download these two datasets that I’ve cleaned up a bit: [If you want a challenge, try downloading them from the websites and cleaning them up yourself!]

Below is a brief description of the variables I’ve put in each dataset:

Econ Freedom

Variable Description
year Year
ISO Three-letter country code
country Name of the country
ef_index Total economic freedom index (0 - least to 100 - most)
rank Rank of the country in terms of economic freedom
continent Continent the country is in

Pol Freedom

Variable Description
country Name of the country
C/T Whether the location is a country (C) or territory (T)
year Year
status Whether the location is Free (F), Partly Free (F) or Not Free (NF)
fh_score Total political freedom index (0 - least to 100 - most)

Import and save them each as an object using my_df_name <- read_csv("name_of_the_file.csv"). I suggest one as econ and the other as pol, but it’s up to you. Look at each object you’ve created.

# import data with read_csv from readr

# note these file paths assume you have a folder called "data" in your working directory

# if you used an R project and did just that (or downloaded my R Project from the website)
# then you already have this done

econ<-read_csv("data/econ_freedom.csv")
## Rows: 4050 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): ISO, country, continent
## dbl (3): year, ef_index, rank
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
pol<-read_csv("data/pol_freedom.csv")
## Rows: 1885 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): country, C/T, status
## dbl (2): year, fh_score
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# look at each dataframe
econ
pol

Question 9

Now let’s join them together so that we can have a single dataset to work with. You can learn more about this in the 1.4 slides. Since both datasets have both country and year (spelled exactly the same in both!), we can use these two variables as a key to combine observations. Run the following code (substituting whatever you want to name your objects):

freedom <- left_join(econ, pol, by=c("country", "year")

Take a look at freedom to make sure it appears to have worked.

freedom <- left_join(econ, pol, by=c("country", "year"))
freedom

Question 11

Part A

Make a barplot of the 10 countries with the highest Economic Freedom index score in 2018. You may want to find this first and save it as an object for your plot’s data layer. Use geom_col() since we will map ef_index to y. If you want to order the bars, set x = fct_reorder(ISO, desc(ef_index)) to reorder ISO (or country, if you prefer) by EF score in descending order.

# grab the top 10 countries by ef in 2018
ef_10<-freedom %>%
  filter(year == 2018) %>%
  arrange(desc(ef_index)) %>%
  slice(1:10)

# look at it just to check
ef_10
# now plot it
ggplot(data = ef_10)+
  aes(x = fct_reorder(ISO, desc(ef_index)), # reorder ISO by ef in order
      y = ef_index)+
  geom_col(aes(fill = continent))+ # coloring is optional
  # above is sufficient, now let's just make it prettier
  geom_text(aes(label = ef_index), # add the score onto the bar
            vjust = 1.2, # adjust it vertically
            color = "white"
            )+
  scale_y_continuous(breaks = seq(0,10,2),
                     limits = c(0,10),
                     expand = c(0,0)
                     )+
  labs(x = "Country",
       y = "Economic Freedom Score",
       title = "Top 10 Countries by Economic Freedom",
       caption = "Source: Frasier Institute",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

Part B

Make a barplot of the 10 countries with the highest Freedom House index score in 2018, similar to what you did for Part A.

# grab the top 10 countries by fh in 2018
pf_10<-freedom %>%
  filter(year == 2018) %>%
  arrange(desc(fh_score)) %>%
  slice(1:10)

# look at it just to check
pf_10
# now plot it
ggplot(data = pf_10)+
  aes(x = fct_reorder(ISO, desc(fh_score)),
      y = fh_score)+
  geom_col(aes(fill = continent))+ # coloring is optional
  # above is sufficient, now let's just make it prettier
  geom_text(aes(label = fh_score), # add the score onto the bar
            vjust = 1.2, # adjust it vertically
            color = "white")+
  scale_y_continuous(breaks = seq(0,100,20),
                     limits = c(0,100),
                     expand = c(0,0))+
  labs(x = "Country",
       y = "Political Freedom Score",
       title = "Top 10 Countries by Political Freedom",
       caption = "Source: Freedom House",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

Question 11

Now make a scatterplot of Political freedom (fh_score as y) on Economic Freedom (ef_index as x) and color by continent.

ggplot(data = freedom)+
  aes(x = ef_index,
      y = fh_score)+
  # doing just geom_point() is fine, but since there's a lot of overlap, here are some things I like to do:
  geom_point(aes(fill = continent), # fill the points with color by continent
             alpha = 0.9, # make points slightly transparent
             color = "white", # outline the points with a white border
             pch = 21, # this shape has an outline and a fill color
             size = 3)+
  scale_x_continuous(breaks = seq(0,10,2),
                     limits = c(0,10),
                     expand = c(0,0))+
  scale_y_continuous(breaks = seq(0,100,20),
                     limits = c(0,105),
                     expand = c(0,0))+
  labs(x = "Economic Freedom Score",
       y = "Political Freedom Score",
       caption = "Sources: Frasier Institute, Freedom House",
       title = "Economic Freedom & Political Freedom",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )
## Warning: Removed 3166 rows containing missing values (geom_point).

Note, I meant to ask you to look at one year only, e.g. 2018. We would just have to filter first:

# save as p
p <- freedom %>%
  filter(year == "2018") %>%
ggplot(data = .)+
  aes(x = ef_index,
      y = fh_score)+
  geom_point(aes(fill = continent),
             alpha = 0.9, 
             color = "white",
             pch = 21, 
             size = 3)+
  scale_x_continuous(breaks = seq(0,10,2),
                     limits = c(0,10),
                     expand = c(0,0))+
  scale_y_continuous(breaks = seq(0,100,20),
                     limits = c(0,105),
                     expand = c(0,0))+
  labs(x = "Economic Freedom Score",
       y = "Political Freedom Score",
       caption = "Sources: Frasier Institute, Freedom House",
       title = "Economic Freedom & Political Freedom",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )

# look at it
p
## Warning: Removed 13 rows containing missing values (geom_point).

Question 12

Save your plot from Question 11 as an object, and add a new layer where we will highlight a few countries. Pick a few countries (I suggest using the ISO code) and create a new object filtering the data to only include these countries (again the %in% command will be most helpful here).

Additionally, install and load a package called "ggrepel", which will adjust labels so they do not overlap on a plot.

Then, add the following layer to your plot:

geom_label_repel(data = countries, # or whatever object name you created
                     aes(x = ef_index,
                         y = fh_score,
                         label = ISO, # show ISO as label (you could do country instead)
                         color = continent),
                     alpha = 0.5, # make it a bit transparent
                     box.padding = 0.75, # control how far labels are from points
                     show.legend = F) # don't want this to add to the legend

This should highlight these countries on your plot.

library(ggrepel)

# pick some countries
some_countries <- freedom %>%
  filter(year==2018,
         country %in% c("United States",
                        "United Kingdom",
                        "Sweden",
                        "China",
                        "Singapore",
                        "Russian Federation",
                        "Korea, Rep.",
                        "Hong Kong SAR, China"))
# add layer 
p + geom_label_repel(data = some_countries, # or whatever object name you created
                     aes(x = ef_index,
                         y = fh_score,
                         label = ISO, # show ISO as label (you could do country instead)
                         color = continent),
                     alpha = 0.75, # make it a bit transparent
                     box.padding = 0.75, # control how far labels are from points
                     show.legend = F) # don't want this to add to the legend
## Warning: Removed 13 rows containing missing values (geom_point).

Question 13

Let’s just look only at the United States and see how it has fared in both measures of freedom over time. filter() the data to look only at ISO == "USA". Use both a geom_point() layer and a geom_path() layer, which will connect the dots over time. Let’s also see this by labeling the years with an additional layer geom_text_repel(aes(label = year)).

# save plot as us
us<-freedom %>%
  filter(ISO == "USA") %>%
ggplot(data = .)+
  aes(x = ef_index,
      y = fh_score)+
  geom_point(color = "red")+
  geom_path(color = "red")+
  geom_text_repel(aes(label = year),
                  color = "red")+
  scale_x_continuous(breaks = seq(8,8.5,0.05),
                     limits = c(8,8.5),
                     expand = c(0,0))+
  scale_y_continuous(breaks = seq(85,95,1),
                     limits = c(85,95),
                     expand = c(0,0))+
  labs(x = "Economic Freedom Score",
       y = "Political Freedom Score",
       caption = "Sources: Frasier Institute, Freedom House",
       title = "U.S. Political & Economic Freedom, 2013—2018",
       fill = "Continent")+
  theme_minimal(base_family = "Fira Sans Condensed")+
  theme(legend.position = "bottom",
        plot.title = element_text(face = "bold", size = rel(1.5))
        )
# look at it
us
## Warning: Removed 19 rows containing missing values (geom_point).
## Warning: Removed 19 row(s) containing missing values (geom_path).
## Warning: Removed 19 rows containing missing values (geom_text_repel).

Note that the way I zoomed in on the scales, these look like pretty dramatic changes!

If we maintain the full perspective, the change appears minor. Be very careful how you present your analysis!

us+
  scale_x_continuous(breaks = seq(0,10,1),
                     limits = c(0,10),
                     expand = c(0,0)
                     )+
  scale_y_continuous(breaks = seq(0,100,10),
                     limits = c(0,100),
                     expand = c(0,0)
                     )
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.
## Scale for 'y' is already present. Adding another scale for 'y', which will
## replace the existing scale.
## Warning: Removed 19 rows containing missing values (geom_point).
## Warning: Removed 19 row(s) containing missing values (geom_path).
## Warning: Removed 19 rows containing missing values (geom_text_repel).